JEFS 99 - Capstone(Python)

Revenue Insights of a Sports Retailer

This is a Capstone project designed to validate what you have learned in the Just Enough Scala for Spark course.

This Capstone gives you an opportunity to apply everything you have learned, including the implementation of classes, methods, exception handling, collections, and functional programming, to a larger problem.

Problem Description

A sports retailer computes the revenue performance of the company at the end of every year.

The revenue-related information is available in comma-separated value format. All quarterly revenue information is available as a single string, where each substring separated by a ; contains information about a specific product category in one quarter of a financial year.

Each substring contains the following information as comma-separated values:

  • which quarter of the year (Q1/Q2/Q3/Q4 etc.)
  • name of the product category
  • total revenue generated in the financial quarter.

A sample data string is shown below.


 "Q1-2018,Exercise_Fitness,10.33;
  Q1-2018,Outdoor_Play_Equipment,7.85;
  Q1-2018,Winter_Sports,3.45"

Each substring separated by ; is considered a record. In the sections that follow, every reference to a record means such a substring (e.g. Q1-2018,Exercise_Fitness,10.33).
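
As a rough illustration of the record structure described above, the sketch below splits a data string on ; into records and then splits one record on , into its three fields. The names sample, records, quarter, category and revenue are assumptions made for this example and are not part of the capstone.

```python
# Illustrative sample in the format described above (all names are assumptions).
sample = "Q1-2018,Exercise_Fitness,10.33;Q1-2018,Outdoor_Play_Equipment,7.85;Q1-2018,Winter_Sports,3.45"

# Each substring separated by ';' is one record.
records = [rec.strip() for rec in sample.split(";")]

# Each record holds three comma-separated fields.
quarter, category, revenue = records[0].split(",")

print(len(records))                       # number of records in the sample
print(quarter, category, float(revenue))  # fields of the first record
```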

This capstone asks participants to develop the module in the 5 steps listed below.

  1. Step 1: Initialize the string
  2. Step 2: Parse and validate a record (substring) to determine whether it is properly formatted.
  3. Step 3: Define a method to filter good and bad records from the complete string.
  4. Step 4: Define a case class to represent a good record and convert all good records to an array of case classes.
  5. Step 5: Create a class that encapsulates all good records and provides methods to calculate insights:
    • a method that returns the total revenue generated by the sports company
    • a method that takes a product category and returns the total revenue generated for that category

Step 1: Initialize the String

The string that contains the data is given below. Store the string in a variable named RevenueInfo.

 "Q1-2018,Exercise_Fitness,10.33; 
  Q1-2018,Outdoor_Play_Equipment,7.85; 
  Q1-2018,Winter_Sports,3.45;
  Q2-2018,Exercise_Fitness,7.63; 
  Q2-2018,Outdoor_Play_Equipment,5.05; 
  Q2-2018,Winter_Sports,-;
  Q3-2018,Exercise_Fitness,1.31; 
  Q3-2018,Outdoor_Play_Equipment,3.95; 
  Q3-2018,Winter_Sports,1.50;
  Q4-2018,Exercise_Fitness,5.71; 
  Q4-2018,Outdoor_Play_Equipment,6.52; 
  Q4-2018,Winter_Sports,4.15"
# ANSWERS
 
RevenueInfo = '''Q1-2018,Exercise_Fitness,10.33; 
                 Q1-2018,Outdoor_Play_Equipment,7.85; 
                 Q1-2018,Winter_Sports,3.45;
                 Q2-2018,Exercise_Fitness,7.63; 
                 Q2-2018,Outdoor_Play_Equipment,5.05; 
                 Q2-2018,Winter_Sports,-;
                 Q3-2018,Exercise_Fitness,1.31; 
                 Q3-2018,Outdoor_Play_Equipment,3.95; 
                 Q3-2018,Winter_Sports,1.50;
                 Q4-2018,Exercise_Fitness,5.71; 
                 Q4-2018,Outdoor_Play_Equipment,6.52; 
                 Q4-2018,Winter_Sports,4.15'''
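
Because the triple-quoted string spans several lines, every record after the first carries a newline and leading spaces once the string is split on ;. The following is a minimal sketch of cleaning that up, assuming a shortened stand-in string with the same layout. The names sampleInfo, raw and records are hypothetical and used only for illustration.

```python
# Shortened stand-in with the same layout as RevenueInfo (hypothetical name).
sampleInfo = '''Q1-2018,Exercise_Fitness,10.33;
                Q1-2018,Winter_Sports,3.45;
                Q2-2018,Winter_Sports,-'''

# Splitting on ';' alone leaves newlines and indentation on each piece.
raw = sampleInfo.split(";")

# strip() removes the surrounding whitespace from every record.
records = [rec.strip() for rec in raw]

print(raw[1] != records[1])  # True: the unstripped piece still has whitespace
print(records)
```

Stripping each record before validation keeps the first field free of stray whitespace, which matters when quarters or categories are compared later.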

Step 2: Parse and Validate a record

In this step, validate a substring that represents a record, e.g. "Q1-2018,Exercise_Fitness,10.33".

  1. Define a method validateRecord, which takes a string named recStr (representing a record)
  2. Parse the string by comma
  3. Validate that the string contains 3 fields
    • If there are fewer than 3 fields, raise an exception
  4. Validate that the third field is numeric (Note: try type casting to float)
    • If it is not numeric, raise an exception
  5. Surround the complete implementation with try/except to handle exceptions
  6. If the record is valid (i.e. passes both checks),

    • then return a tuple with the tag GOOD and the record string, as ("GOOD", recStr)
    • if the record is not valid, return a tuple with the tag BAD and the record string, as ("BAD", recStr)

    Note: recStr is the original string passed to the method
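
The numeric check in item 4 relies on float raising a ValueError for text it cannot parse. A small sketch of that behaviour follows, using a hypothetical helper named isNumeric that is not part of the required solution.

```python
def isNumeric(text):
  """Return True if float() can parse text (hypothetical helper)."""
  try:
    float(text)
    return True
  except ValueError:
    return False

print(isNumeric("10.33"))  # True
print(isNumeric("-"))      # False: a lone dash is not a number
print(isNumeric("as"))     # False
print(isNumeric("nan"))    # True: float() also accepts "nan" and "inf"
```

Note that float also accepts strings such as "nan" and "inf", so a stricter check may be worth considering if such values should be treated as bad records.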

# ANSWERS
 
def validateRecord(recStr):
  
  # Wrap the complete implementation in try/except to handle exceptions
  try:
    # Split the string by comma
    fields = recStr.split(",")
 
    # If the number of fields in the string is less than 3, then raise an exception
    if len(fields) < 3:
      raise Exception("Expected 3 fields. Found only " + str(len(fields)))
      
    # Convert the third field (revenue) to float; this raises ValueError if it is not numeric
    float(fields[2])
 
    # If we have reached here without any problem, then it is a good record. Tag the record GOOD and return
    return ("GOOD", recStr)
    
  # If we have reached here, then it is a bad record. Tag the record BAD and return
  except Exception:
    return ("BAD", recStr)
# TEST - Run this cell to test your solution.
 
test1Str = validateRecord("Q1-2018,Exercise_Fitness,10.13")
test1StrExpected = ("GOOD","Q1-2018,Exercise_Fitness,10.13")
 
test2Str = validateRecord("Q1-2018,Exercise_Fitness,as")
test2StrExpected = ("BAD","Q1-2018,Exercise_Fitness,as")
 
test3Str = validateRecord("Q1-2018,Exercise_Fitness")
test3StrExpected = ("BAD","Q1-2018,Exercise_Fitness")
 
assert test1Str == test1StrExpected, "Expected the result to be " + str(test1StrExpected) + " but found " + str(test1Str)
assert test2Str == test2StrExpected, "Expected the result to be " + str(test2StrExpected) + " but found " + str(test2Str)
assert test3Str == test3StrExpected, "Expected the result to be " + str(test3StrExpected) + " but found " + str(test3Str)